Setup

The alignments in this analysis were generated by pseudo-aligning each library (including technical replicates) to the zebrafish transcriptome from Ensembl Release 94 (GRCz11) using kallisto (v0.43.1). In addition to the standard transcriptome, the two mutant psen2 transcripts were manually added to the reference.

The corresponding set of gene descriptions was then loaded into R as an EnsDb object using the AnnotationHub() infrastructure. Likewise, the set of transcript descriptions was loaded, with the manual addition of the two novel psen2 mutant transcripts.

Gene-level Counts

Gene-level counts were imported using tximport, mapping transcripts to genes. Some genes exist in the primary assembly and on alternate assemblies for specific regions, and these were considered as separate transcripts of the same gene for read summarisation purposes. Transcript counts were thus mapped to genes using the gene symbol (e.g. psen2), instead of the gene id.

Genes were retained for analysis if a CPM > 1 was observed for \(\geq\) 5 samples. This equated to about 31 reads for a gene in at least 5 samples for inclusion in downstream analysis, giving a total of 18,808 of the original genes for DGE analysis.
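The filtering rule above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used for the analysis. The library sizes and counts are hypothetical; a library of 31 million assigned reads is assumed only because a CPM of 1 then corresponds to the "about 31 reads" quoted above.

```python
def cpm(counts, lib_sizes):
    """Counts per million for a genes x samples table of raw counts."""
    return [
        [1e6 * c / n for c, n in zip(row, lib_sizes)]
        for row in counts
    ]


def keep_gene(cpm_row, min_cpm=1.0, min_samples=5):
    """True if CPM exceeds min_cpm in at least min_samples samples."""
    return sum(v > min_cpm for v in cpm_row) >= min_samples


# Hypothetical library sizes (total reads assigned to genes) for six samples.
lib_sizes = [31_000_000] * 6

# Hypothetical raw counts for two genes across the six samples.
counts = [
    [40, 35, 50, 33, 32, 10],  # more than 31 reads in five samples
    [40, 35, 50, 33, 31, 10],  # only four samples exceed 31 reads
]

kept = [keep_gene(row) for row in cpm(counts, lib_sizes)]
print(kept)
```

With these toy values the first gene passes the filter and the second does not, because exactly 31 reads gives a CPM of 1 rather than a CPM above 1.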

*Total counts from each library after assigning to genes*

Counts were also processed using the voom transformation with sample-level quality weights (voomWithQualityWeights()), to allow for analysis using normal-based algorithms. Sample weights ranged between 0.4185 and 1.392, with the most strongly down-weighted sample being a WT sample.

Transcript-level Counts

Transcript-level counts were imported using catchKallisto() from edgeR in order to utilise the voom transformation on transcript-level counts.

*Sample weights using transcript-level counts, showing near identical patterns to those observed at the gene-level.*

Genotype checks

*CPM values for each psen2 transcript across all samples.*

Transcript abundances (as CPM) were calculated for each of the three psen2 transcripts. These showed the expected pattern of heterozygous expression in the FAD samples and exclusively WT expression in the WT samples. For sample 8_FS_4, however, no WT allele was detected. As this could not be explained, the sample should be excluded from all analyses. The remaining FS samples showed reduced abundance of the FS transcript, as expected under nonsense-mediated decay (NMD). No increase in expression of the WT allele was evident, supporting a lack of genetic compensation.

This sample was then removed from all objects.
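The logic of this genotype check can be sketched as below. It is a hedged illustration only: the sample names, CPM values, and the detection threshold `min_cpm` are all hypothetical, and the function is not taken from the original analysis.

```python
def genotype_consistent(genotype, cpm_wt, cpm_fs, cpm_fad, min_cpm=0.0):
    """Check that the detected psen2 transcripts match the recorded genotype."""
    detected = {
        "WT": cpm_wt > min_cpm,
        "FS": cpm_fs > min_cpm,
        "FAD": cpm_fad > min_cpm,
    }
    # Heterozygous mutants are expected to express both alleles.
    expected = {
        "WT": {"WT"},
        "FS": {"WT", "FS"},
        "FAD": {"WT", "FAD"},
    }[genotype]
    observed = {allele for allele, seen in detected.items() if seen}
    return observed == expected


# Hypothetical samples: (name, genotype, CPM of WT, FS and FAD transcripts).
samples = [
    ("hypothetical_WT", "WT", 25.0, 0.0, 0.0),
    ("hypothetical_FS_ok", "FS", 12.0, 3.0, 0.0),
    ("hypothetical_FS_bad", "FS", 0.0, 3.0, 0.0),
]

flagged = [
    name
    for name, genotype, wt, fs, fad in samples
    if not genotype_consistent(genotype, wt, fs, fad)
]
print(flagged)
```

In this toy data only the heterozygous sample with no detectable WT transcript is flagged, which mirrors the situation described for 8_FS_4.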

Data Inspection

The next step was to perform an MDS analysis, in which minimal separation was observed between sample groups. A simple PCA also revealed that the first few principal components captured less of the total variability than might be expected.

MDS plot showing no clear groups within the data. Point sizes indicate sample weights as calculated by voomWithQualityWeights().

First five principal components, showing that the first two account for only 33.7% of the total variance, which is below expectations.

| | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Standard deviation | 22.34 | 21.15 | 17 | 16.42 | 15.46 |
| Proportion of Variance | 0.1778 | 0.1594 | 0.1029 | 0.09607 | 0.08509 |
| Cumulative Proportion | 0.1778 | 0.3372 | 0.4401 | 0.5362 | 0.6213 |
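The rows of this summary are related by simple arithmetic: each proportion is the squared standard deviation of a component divided by the total variance. The sketch below, in plain Python, checks this using only the reported values. The total variance is not reported, so it is inferred here from PC1, which is an assumption of the example.

```python
# Values copied from the PCA summary above.
sdev = [22.34, 21.15, 17.0, 16.42, 15.46]
reported = [0.1778, 0.1594, 0.1029, 0.09607, 0.08509]

# Total variance is not reported; infer it from PC1 (assumption).
total_var = sdev[0] ** 2 / reported[0]

# Proportion of variance implied by each standard deviation.
implied = [s ** 2 / total_var for s in sdev]

# Running total of the reported proportions.
cumulative = []
running = 0.0
for p in reported:
    running += p
    cumulative.append(running)

for i, (imp, rep, cum) in enumerate(zip(implied, reported, cumulative), start=1):
    print(f"PC{i}: implied={imp:.4f} reported={rep:.4f} cumulative={cum:.4f}")
```

The implied proportions agree with the reported ones to within rounding, and the first two components sum to the 33.7% quoted in the caption.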

DGE Analysis

Design

Three comparisons were defined with the first two being the difference between the two mutant families and the wild-type samples. The third comparison was defined as being between the two mutant groups.
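These three comparisons can be written as contrast vectors over the group means. The sketch below is illustrative only: it assumes a single model with one mean per group (WT, FS, FAD), and the group means are hypothetical log-scale values. The sign convention follows the text, with each logFC defined as the first-named group minus the second.

```python
GROUPS = ("WT", "FS", "FAD")

# Contrast weights applied to the group means, in the order given by GROUPS.
CONTRASTS = {
    "FS_vs_WT": (-1, 1, 0),
    "FAD_vs_WT": (-1, 0, 1),
    "FAD_vs_FS": (0, -1, 1),
}


def log_fc(group_means, weights):
    """Estimated logFC as a weighted sum of group means."""
    return sum(w * m for w, m in zip(weights, group_means))


# Hypothetical mean log2 expression for one gene in WT, FS and FAD samples.
group_means = (5.0, 4.4, 5.1)

fc = {name: log_fc(group_means, w) for name, w in CONTRASTS.items()}
print(fc)
```

Under this assumption the third contrast is not independent of the first two: the FAD vs FS estimate equals the FAD vs WT estimate minus the FS vs WT estimate.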

FS Vs WT

The first analysis compared psen2^N140fs/+^ samples to psen2^+/+^ samples. A total of 4 genes were potentially detected as differentially expressed (DE) using an FDR of 5%. In the following plots, a negative value for logFC corresponds to decreased expression in the heterozygous mutants.

*MD plot for psen2^N140fs/+^ samples compared to psen2^+/+^ samples*

*Volcano plot for psen2^N140fs/+^ samples compared to psen2^+/+^ samples*

10 most highly ranked genes in the comparison between psen2^N140fs/+^ samples and psen2^+/+^ samples

| Symbol | logFC | AveExpr | P.Value | FDR |
|---|---|---|---|---|
| psen2 | -0.6211 | 4.591 | 2.827e-08 | 0.0002807 |
| CABZ01035279.1 | -9.689 | 0.4716 | 2.985e-08 | 0.0002807 |
| ptcd1 | 0.8541 | 2.486 | 3.256e-06 | 0.02041 |
| CU179663.1 | -0.9028 | 3.711 | 9.727e-06 | 0.04574 |
| BX649405.1 | -1.045 | 2.21 | 1.713e-05 | 0.06442 |
| atxn1l | -0.6898 | 2.457 | 2.681e-05 | 0.08406 |
| pcnp | 0.5006 | 5.545 | 7.108e-05 | 0.191 |
| mhc1zea | -0.3211 | 4.509 | 0.000125 | 0.2525 |
| si:ch211-160d14.6 | -0.4693 | 4.582 | 0.0001371 | 0.2525 |
| lrrc4ba | -0.4685 | 4.853 | 0.0001431 | 0.2525 |

*Expression patterns for significantly DE genes in the comparison between psen2^N140fs/+^ samples and psen2^+/+^ samples. Values are given as CPM using an offset of 1 to avoid zeroes, with the y-axis being displayed on the log scale.*

*Expression patterns for the next most highly ranked genes in the comparison between psen2^N140fs/+^ samples and psen2^+/+^ samples, but which are not formally considered as DE. Values are given as CPM using an offset of 1 to avoid zeroes, with the y-axis being displayed on the log scale.*
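The FDR column in the table for this comparison can be checked by hand. The sketch below assumes a Benjamini-Hochberg adjustment across the 18,808 retained genes; this is the default in limma's topTable(), but the adjustment method is not stated in the text, so it is an assumption. Only the five smallest p-values are used, which means the monotonicity step sees only those values rather than the full set.

```python
def bh_adjust(sorted_p, n_tests):
    """Benjamini-Hochberg adjustment for the k smallest of n_tests p-values.

    sorted_p must be in ascending order and hold ranks 1..k.
    """
    raw = [
        min(1.0, p * n_tests / rank)
        for rank, p in enumerate(sorted_p, start=1)
    ]
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted = []
    running_min = 1.0
    for value in reversed(raw):
        running_min = min(running_min, value)
        adjusted.append(running_min)
    return adjusted[::-1]


# Five smallest p-values from the FS vs WT table.
p_values = [2.827e-08, 2.985e-08, 3.256e-06, 9.727e-06, 1.713e-05]
n_genes = 18808  # genes retained after filtering

adjusted = bh_adjust(p_values, n_genes)
n_de = sum(a < 0.05 for a in adjusted)

for p, a in zip(p_values, adjusted):
    print(f"p={p:.3e} adjusted={a:.4g}")
print("genes below FDR 5%:", n_de)
```

Under these assumptions the adjusted values agree with the tabulated FDR column to within rounding, including the shared value for the first two genes, and four genes fall below the 5% threshold.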

FAD Vs WT

The next analysis compared psen2^T141_L142delinsMISLISV/+^ samples to psen2^+/+^ samples. No genes could be considered DE at any FDR threshold up to 50%. In the following plots, a negative value for logFC corresponds to decreased expression in the heterozygous mutants.

*MD plot for psen2^T141_L142delinsMISLISV/+^ samples compared to psen2^+/+^ samples*

*Volcano plot for psen2^T141_L142delinsMISLISV/+^ samples compared to psen2^+/+^ samples*

10 most highly ranked genes in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^+/+^ samples

| Symbol | logFC | AveExpr | P.Value | FDR |
|---|---|---|---|---|
| si:ch73-236c18.2 | 1.182 | 2.05 | 4.181e-05 | 0.7864 |
| tnk2a | 0.2508 | 5.591 | 0.0001202 | 0.9658 |
| BX548026.1 | -0.7332 | 0.9484 | 0.0002038 | 0.9658 |
| si:ch211-56a11.2 | 0.9663 | 1.25 | 0.0002288 | 0.9658 |
| stn1 | 0.4968 | 3.078 | 0.000322 | 0.9658 |
| BX890543.1 | -0.6475 | 2.308 | 0.000329 | 0.9658 |
| si:ch211-15j1.4 | 0.9632 | 4.827 | 0.0004467 | 0.9658 |
| celsr1b | -0.2697 | 5.413 | 0.000476 | 0.9658 |
| EIF1B | -0.8818 | 7.518 | 0.0005066 | 0.9658 |
| si:ch211-114l13.4 | 1.051 | 2.016 | 0.0006596 | 0.9658 |

*Expression patterns for the 5 most highly ranked genes in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^+/+^ samples. None were considered as DE. Values are given as CPM using an offset of 1 to avoid zeroes, with the y-axis being displayed on the log scale.*

FAD Vs FS

The final analysis compared psen2^T141_L142delinsMISLISV/+^ samples to psen2^N140fs/+^ samples. A total of 6 genes were potentially detected as differentially expressed using an FDR of 5%. In the following plots, a negative value for logFC corresponds to decreased expression in psen2^T141_L142delinsMISLISV/+^ samples relative to psen2^N140fs/+^ samples, whilst a positive value corresponds to increased expression.

*MD plot for psen2^T141_L142delinsMISLISV/+^ samples compared to psen2^N140fs/+^ samples*

*Volcano plot for psen2^T141_L142delinsMISLISV/+^ samples compared to psen2^N140fs/+^ samples*

10 most highly ranked genes in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^N140fs/+^ samples

| Symbol | logFC | AveExpr | P.Value | FDR |
|---|---|---|---|---|
| psen2 | 0.7093 | 4.591 | 4.095e-09 | 7.703e-05 |
| CABZ01035279.1 | 8.766 | 0.4716 | 8.259e-08 | 0.0007767 |
| CU179663.1 | 0.9465 | 3.711 | 4.697e-06 | 0.02945 |
| si:ch73-236c18.2 | 1.448 | 2.05 | 8.036e-06 | 0.03778 |
| si:ch211-114l13.3 | 2.008 | -0.1774 | 1.133e-05 | 0.04263 |
| si:ch211-114l13.4 | 1.651 | 2.016 | 1.361e-05 | 0.04267 |
| BX649405.1 | 0.9933 | 2.21 | 2.385e-05 | 0.06408 |
| CABZ01084501.2 | 0.6405 | 3.94 | 4.722e-05 | 0.111 |
| mcoln1a | -0.5633 | 4.202 | 5.725e-05 | 0.1196 |
| BX649434.3 | 0.7764 | 2.85 | 8.793e-05 | 0.1654 |

*Expression patterns for significantly DE genes in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^N140fs/+^ samples. This is essentially a subset of the previously identified genes*

*Expression patterns for the next most highly ranked genes in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^N140fs/+^ samples, but which are not formally considered as DE*

Similarity between Mutants

Although there were minimal DE genes in the above analysis, the similarity between differential expression values (i.e. logFC) was inspected visually.

*Comparison between mutants showing logFC for each mutant, based on comparison against WT samples. Genes considered statistically significant between mutants are highlighted in red. Several other genes showed either highly similar behaviour or were suggestive of different behaviours, and these genes are labelled in grey. Dashed horizontal and vertical lines have been placed at $\pm1$, with the unit line also shown in pale blue.*

Expression patterns for genes showing similarity of apparent differential expression across both mutants when compared to WT samples. In all cases one or more outlier points appear to have impacted the ability for these genes to be considered as DE. With the exception of the first two genes, these outlier samples were not consistent. Jitter has been added to the x-axis.

Expression patterns for genes showing potential differences in apparent differential expression across both mutants when compared to WT samples. In all cases one or more outlier points appear to have impacted the ability for these genes to be considered as DE. With the exception of the second and third genes in the top row, these outlier samples were not consistent. Jitter has been added to the x-axis.

Enrichment Analysis

Hallmark Gene Sets

KEGG Gene Sets

Differential Transcript Expression

As the level of transcript complexity is lower in zebrafish than in human, and 1:1 mapping between species is less robust, only a brief analysis was performed at the transcript level. In essence, the same genes were found as the most highly ranked, with changes in expression of psen2 transcripts detected as expected, providing a form of positive control. Following the top tables, the basic transcript expression patterns are shown for three possible genes of interest. Notably, the transcripts showing the strongest differential expression are expressed at very low levels for both si:ch211-132g1.3 and slc37a4b.
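To put "very low levels" in context, the AveExpr values reported in the transcript-level tables can be back-transformed. The sketch below assumes AveExpr is an average on the log2-CPM scale, as produced by voom; because it is an average of logged values, the back-transformed figure is only an approximate CPM.

```python
# AveExpr values copied from the transcript-level top tables.
ave_expr = {
    "si:ch211-132g1.3 (ENSDART00000137332)": -1.839,
    "slc37a4b (ENSDART00000150193)": -0.06034,
}

# Back-transform from log2-CPM to an approximate CPM (assumption: log2 scale).
approx_cpm = {name: 2 ** value for name, value in ave_expr.items()}

for name, value in approx_cpm.items():
    print(f"{name}: ~{value:.2f} CPM")
```

Under this assumption both transcripts average below 1 CPM, which is consistent with the caution expressed in the text about their strong logFC estimates.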

10 most highly ranked transcripts in the comparison between psen2^N140fs/+^ samples and psen2^+/+^ samples

| Transcript | Symbol | logFC | AveExpr | P.Value | FDR | gene_id |
|---|---|---|---|---|---|---|
| ENSDART00000187524 | CABZ01035279.1 | -8.715 | 0.2798 | 1.258e-07 | 0.003779 | ENSDARG00000116774 |
| ENSDART00000137332 | si:ch211-132g1.3 | -5.549 | -1.839 | 3.027e-07 | 0.004548 | ENSDARG00000089477 |
| psen2N140fs | psen2 | 3.63 | -4.436 | 2.664e-06 | 0.02669 | ENSDARG00000015540 |
| ENSDART00000114613 | ptcd1 | 0.8547 | 2.524 | 4.391e-06 | 0.03299 | ENSDARG00000076176 |
| ENSDART00000185608 | si:ch211-160d14.6 | -6.75 | -1.138 | 8.905e-06 | 0.05351 | ENSDARG00000115710 |
| ENSDART00000188158 | BX649405.1 | -1.041 | 2.066 | 2.743e-05 | 0.1374 | ENSDARG00000112605 |
| ENSDART00000127351 | atxn1l | -0.6731 | 2.866 | 4.159e-05 | 0.1738 | ENSDARG00000086977 |
| ENSDART00000006381 | psen2 | -0.9806 | 1.091 | 4.674e-05 | 0.1738 | ENSDARG00000015540 |
| ENSDART00000182716 | actb1 | -4.425 | -2.592 | 5.207e-05 | 0.1738 | ENSDARG00000113649 |
| ENSDART00000101586 | pcnp | 0.8028 | 2.719 | 7.057e-05 | 0.2121 | ENSDARG00000037713 |

10 most highly ranked transcripts in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^+/+^ samples

| Transcript | Symbol | logFC | AveExpr | P.Value | FDR | gene_id |
|---|---|---|---|---|---|---|
| psen2T141_L142delinsMISLISV | psen2 | 5.931 | -3.349 | 1.13e-11 | 3.395e-07 | ENSDARG00000015540 |
| ENSDART00000006381 | psen2 | -0.9687 | 1.091 | 2.606e-05 | 0.3916 | ENSDARG00000015540 |
| ENSDART00000168762 | si:ch73-236c18.2 | 1.199 | 0.9262 | 4.544e-05 | 0.4551 | ENSDARG00000103829 |
| ENSDART00000078192 | cnpy4 | 0.9683 | 0.3566 | 9.046e-05 | 0.6795 | ENSDARG00000055797 |
| ENSDART00000133864 | gpr143 | -0.8098 | 0.5032 | 0.0001229 | 0.7387 | ENSDARG00000034572 |
| ENSDART00000168837 | fam168b | 1.738 | 2.394 | 0.0001915 | 0.866 | ENSDARG00000101733 |
| ENSDART00000185486 | BX890543.1 | -0.671 | 2.583 | 0.0002827 | 0.866 | ENSDARG00000114583 |
| ENSDART00000134826 | si:ch211-15j1.4 | 0.9712 | 4.025 | 0.0003187 | 0.866 | ENSDARG00000092604 |
| ENSDART00000027624 | stn1 | 0.4981 | 3.561 | 0.0003206 | 0.866 | ENSDARG00000007734 |
| ENSDART00000144157 | si:ch211-56a11.2 | 0.9227 | 1.733 | 0.000344 | 0.866 | ENSDARG00000093677 |

10 most highly ranked transcripts in the comparison between psen2^T141_L142delinsMISLISV/+^ samples and psen2^N140fs/+^ samples

| Transcript | Symbol | logFC | AveExpr | P.Value | FDR | gene_id |
|---|---|---|---|---|---|---|
| psen2T141_L142delinsMISLISV | psen2 | 5.913 | -3.349 | 2.255e-11 | 6.777e-07 | ENSDARG00000015540 |
| ENSDART00000187524 | CABZ01035279.1 | 7.696 | 0.2798 | 4.478e-07 | 0.006728 | ENSDARG00000116774 |
| ENSDART00000137332 | si:ch211-132g1.3 | 4.79 | -1.839 | 1.39e-06 | 0.01392 | ENSDARG00000089477 |
| psen2N140fs | psen2 | -3.635 | -4.436 | 2.056e-06 | 0.01545 | ENSDARG00000015540 |
| ENSDART00000150193 | slc37a4b | -1.158 | -0.06034 | 3.877e-06 | 0.0233 | ENSDARG00000077180 |
| ENSDART00000168762 | si:ch73-236c18.2 | 1.453 | 0.9262 | 9.67e-06 | 0.04843 | ENSDARG00000103829 |
| ENSDART00000141678 | si:ch211-114l13.3 | 1.966 | -0.9645 | 1.788e-05 | 0.0734 | ENSDARG00000094346 |
| ENSDART00000147678 | si:dkey-222h21.2 | 2.015 | 0.6033 | 1.954e-05 | 0.0734 | ENSDARG00000094297 |
| ENSDART00000188136 | CABZ01084501.2 | 0.6341 | 4.423 | 4.033e-05 | 0.1226 | ENSDARG00000113332 |
| ENSDART00000185608 | si:ch211-160d14.6 | 5.695 | -1.138 | 4.42e-05 | 0.1226 | ENSDARG00000115710 |

One transcript (ENSDART00000137332) was undetectable in the FS mutants, and this is a non-coding transcript.

| tx_id | seqnames | start | end | width | strand | tx_biotype |
|---|---|---|---|---|---|---|
| ENSDART00000131675 | 1 | 1874427 | 1885594 | 11168 | - | protein_coding |
| ENSDART00000165669 | 1 | 1888541 | 1894722 | 6182 | - | protein_coding |
| ENSDART00000137332 | 1 | 1874008 | 1885600 | 11593 | - | processed_transcript |
| ENSDART00000143790 | 1 | 1887620 | 1928122 | 40503 | - | processed_transcript |
| ENSDART00000147773 | 1 | 1890171 | 1894005 | 3835 | - | processed_transcript |

Session Info

R version 3.6.1 (2019-07-05)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ensembldb(v.2.8.0), AnnotationFilter(v.1.8.0), GenomicFeatures(v.1.36.4), AnnotationDbi(v.1.46.0), Biobase(v.2.44.0), GenomicRanges(v.1.36.0), GenomeInfoDb(v.1.20.0), IRanges(v.2.18.1), S4Vectors(v.0.22.0), here(v.0.1), pander(v.0.6.3), fgsea(v.1.10.0), Rcpp(v.1.0.2), ggrepel(v.0.8.1), forcats(v.0.4.0), stringr(v.1.4.0), dplyr(v.0.8.3), purrr(v.0.3.2), readr(v.1.3.1), tidyr(v.0.8.3), tibble(v.2.1.3), ggplot2(v.3.2.0), tidyverse(v.1.2.1), scales(v.1.0.0), magrittr(v.1.5), AnnotationHub(v.2.16.0), BiocFileCache(v.1.8.0), dbplyr(v.1.4.2), BiocGenerics(v.0.30.0), tximport(v.1.12.3), edgeR(v.3.26.5) and limma(v.3.40.4)

loaded via a namespace (and not attached): colorspace(v.1.4-1), rprojroot(v.1.3-2), XVector(v.0.24.0), rstudioapi(v.0.10), bit64(v.0.9-7), interactiveDisplayBase(v.1.22.0), lubridate(v.1.7.4), xml2(v.1.2.0), knitr(v.1.23), zeallot(v.0.1.0), jsonlite(v.1.6), Cairo(v.1.5-10), Rsamtools(v.2.0.0), broom(v.0.5.2), shiny(v.1.3.2), BiocManager(v.1.30.4), compiler(v.3.6.1), httr(v.1.4.0), backports(v.1.1.4), assertthat(v.0.2.1), Matrix(v.1.2-17), lazyeval(v.0.2.2), cli(v.1.1.0), later(v.0.8.0), htmltools(v.0.3.6), prettyunits(v.1.0.2), tools(v.3.6.1), gtable(v.0.3.0), glue(v.1.3.1), GenomeInfoDbData(v.1.2.1), rappdirs(v.0.3.1), fastmatch(v.1.1-0), cellranger(v.1.1.0), vctrs(v.0.2.0), Biostrings(v.2.52.0), nlme(v.3.1-140), rtracklayer(v.1.44.2), crosstalk(v.1.0.0), xfun(v.0.8), rvest(v.0.3.4), mime(v.0.7), XML(v.3.98-1.20), zlibbioc(v.1.30.0), ProtGenerics(v.1.16.0), hms(v.0.5.0), promises(v.1.0.1), SummarizedExperiment(v.1.14.0), rhdf5(v.2.28.0), yaml(v.2.2.0), curl(v.4.0), memoise(v.1.1.0), gridExtra(v.2.3), biomaRt(v.2.40.3), stringi(v.1.4.3), RSQLite(v.2.1.2), highr(v.0.8), BiocParallel(v.1.18.0), rlang(v.0.4.0), pkgconfig(v.2.0.2), bitops(v.1.0-6), matrixStats(v.0.54.0), evaluate(v.0.14), lattice(v.0.20-38), Rhdf5lib(v.1.6.0), htmlwidgets(v.1.3), labeling(v.0.3), GenomicAlignments(v.1.20.1), bit(v.1.1-14), tidyselect(v.0.2.5), R6(v.2.4.0), generics(v.0.0.2), DelayedArray(v.0.10.0), DBI(v.1.0.0), pillar(v.1.4.2), haven(v.2.1.1), withr(v.2.1.2), RCurl(v.1.95-4.12), modelr(v.0.1.4), crayon(v.1.3.4), plotly(v.4.9.0), rmarkdown(v.1.14), progress(v.1.2.2), locfit(v.1.5-9.1), grid(v.3.6.1), readxl(v.1.3.1), data.table(v.1.12.2), blob(v.1.2.0), digest(v.0.6.20), xtable(v.1.8-4), httpuv(v.1.5.1), munsell(v.0.5.0) and viridisLite(v.0.3.0)